1. Problem Studied


The basic objective of this project has been to consider a large class of matrix
computations, with particular emphasis on algorithms that can be implemented on arrays of
processors. In particular, we have been interested in methods that are useful for sparse
matrix computations. These computations arise in a variety of applications, such as the
solution of partial differential equations by multigrid methods and the fitting of geodetic
data. Some of the methods developed have already found use on some of the newly
developed architectures (see below).

2. Summary of Important Results

Five reports and papers have been written during this grant. We describe some of the
results given in these reports. A complete list of the reports and papers is given in
Section 3 of this report.


The serial multigrid algorithm is a fast and efficient technique for solving elliptic
partial differential equations. The algorithm consists of “solving” a series of problems on
a hierarchy of grids with different mesh sizes. For many problems, it is possible to prove
that its execution time is asymptotically optimal. Not only is it asymptotically optimal,
but when properly implemented it is also competitive with other algorithms on grids of
modest size. Given its success on serial computers, it is natural to consider its performance
characteristics on parallel machines.
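To make the idea of a grid hierarchy concrete, here is a minimal sketch of a recursive V-cycle for the 1D model problem -u'' = f on (0, 1) with zero boundary values. Everything in it is an illustrative assumption, not the implementation studied in this project: the model problem, weighted Jacobi smoothing with omega = 2/3, full-weighting restriction, linear interpolation, and the helper names `residual`, `smooth`, and `v_cycle`.

```python
import numpy as np

def residual(u, f, h):
    """r = f - A u, with A = tridiag(-1, 2, -1) / h**2 and u = 0 on the boundary."""
    up = np.pad(u, 1)  # append the two zero boundary values
    return f - (2 * up[1:-1] - up[:-2] - up[2:]) / h**2

def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi: u <- u + omega * D^{-1} r, where D = 2 / h**2."""
    for _ in range(sweeps):
        u = u + omega * (h**2 / 2.0) * residual(u, f, h)
    return u

def v_cycle(u, f, h, pre=2, post=2):
    """One V-cycle on a grid of 2**L - 1 interior points with spacing h."""
    if u.size == 1:
        return f * h**2 / 2.0  # coarsest grid: solve (2 / h**2) u = f exactly
    u = smooth(u, f, h, pre)  # damp the high-frequency error
    r = residual(u, f, h)
    rc = (r[:-2:2] + 2 * r[1::2] + r[2::2]) / 4.0  # full-weighting restriction
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h, pre, post)  # coarse-grid correction
    e = np.zeros_like(u)
    e[1::2] = ec  # coarse points coincide with the odd fine points
    ecp = np.pad(ec, 1)
    e[0::2] = (ecp[:-1] + ecp[1:]) / 2.0  # linear interpolation in between
    return smooth(u + e, f, h, post)  # post-smoothing

n = 2**7 - 1
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
f = np.pi**2 * np.sin(np.pi * x)
u = np.zeros(n)
for cycle in range(10):
    u = v_cycle(u, f, h)
    print(cycle, np.linalg.norm(residual(u, f, h)))
```

Each cycle touches every grid once and the grids shrink geometrically, so the work per cycle is proportional to the number of fine-grid points; the residual printed above should fall by a roughly constant factor per cycle, independent of n.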


Primarily, this research considers the mapping of the multigrid algorithm onto the
distributed-memory, message-passing hypercube computer. The work illustrates how the
topology of the hypercube fits the data flow of the multigrid algorithm and therefore
allows parallel implementations with relatively low communication cost. It has been shown
that the multigrid algorithm is, in a certain sense, an asymptotically optimal parallel
algorithm. A timing model for the execution time of a particular implementation was
developed and found to model accurately the experimental results obtained from runs on the
Intel iPSC system. This model was further used to explore the influence of machine and
algorithm parameters on the efficiency of the method.
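One standard way to see why the hypercube suits multigrid is the binary-reflected Gray-code embedding of a grid line: neighboring fine-grid points land on adjacent nodes, and points 2^k apart (the neighbors on coarser grids) are at most two hops apart. The sketch below checks this for a periodic 1D grid with one point per node. The mapping and the helper names are assumptions for illustration; we do not claim this is the exact mapping used in this work. For 2D or 3D grids the same code is typically applied to each coordinate separately.

```python
def gray(i: int) -> int:
    """Binary-reflected Gray code of i: the hypercube node that holds grid point i."""
    return i ^ (i >> 1)

def hamming(a: int, b: int) -> int:
    """Number of differing bits, i.e. the hypercube distance between nodes a and b."""
    return bin(a ^ b).count("1")

def max_hops(d: int, k: int) -> int:
    """Largest hypercube distance between the nodes holding periodic grid points
    i and i + 2**k, when the 2**d points are placed on a d-cube by Gray code."""
    p = 2 ** d
    return max(hamming(gray(i), gray((i + 2 ** k) % p)) for i in range(p))

d = 5
for k in range(d):
    print(f"grid with stride 2**{k}: neighbors at most {max_hops(d, k)} hop(s) apart")
```

The bound follows because Gray coding commutes with XOR: gray(i) ^ gray(j) = gray(i ^ j), and adding 2^k to i flips a contiguous run of bits starting at bit k, whose Gray code has exactly two set bits when k ≥ 1 (one when k = 0).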

One difficulty with the parallel multigrid algorithm is a load-balancing problem that
creates inefficiency on systems with many processors, because processors become idle on the
coarse grids. The current research, which is described in [1], is concerned with evaluating
the magnitude of this problem and developing new algorithms that do not have these
difficulties. One new algorithm exploits the idle processors to accelerate the convergence
of the basic multigrid method. Additional work is necessary to evaluate this algorithm fully;
however, the preliminary analysis and experiments are promising.
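The source of the idle processors can be seen with a simple count. The sketch below assumes, purely for illustration, that grid level l has 2^l points per dimension (boundary points ignored), that the machine has 2^p processors, and that each point is owned by exactly one processor with points spread as evenly as possible; it reports the fraction of processors that own any point on each level. It ignores communication entirely and says nothing about the new algorithm of [1].

```python
def active_fraction(fine_log2, proc_log2, dim=2):
    """Fraction of processors owning at least one grid point on each level,
    finest level first (hypothetical load model, see lead-in)."""
    procs = 2 ** proc_log2
    fractions = []
    for level in range(fine_log2, -1, -1):
        points = 2 ** (level * dim)  # points on this level, boundaries ignored
        fractions.append(min(procs, points) / procs)
    return fractions

for level, frac in zip(range(6, -1, -1), active_fraction(6, 10)):
    print(f"level {level}: {frac:.4%} of 1024 processors busy")
```

With a 64 × 64 finest grid on 1024 processors, every processor is busy on the two finest levels, but only a quarter are busy on the next one, and the fraction keeps dropping by a factor of four per level.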

The potential of systolic-like architectures (n × n grids of relatively simple and small
processors) for performing the direct sparse Cholesky factorization of a positive definite
matrix has been demonstrated in [5]. Such matrices arise in the discretization of elliptic
partial differential equations by finite elements or finite differences. The factorization
and the back substitution each require O(n) parallel floating-point multiplications,
realizing the theoretical parallel execution times that had previously been derived only
abstractly, without a means of implementation. The algorithm described here has been the
basis for the nested dissection program developed for the Connection Machine, and its
implementation has been quite successful.
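Nested dissection orders a grid so that a separator splits the remaining unknowns into two halves that can be eliminated independently, and hence in parallel, before the separator itself. The sketch below produces such an ordering for a rows × cols grid of a 5-point discretization. It is a simple illustrative variant (straight-line separators, always cutting the longer side, with the hypothetical name `nested_dissection`), not the Connection Machine program mentioned above.

```python
def nested_dissection(rows, cols):
    """Order the points (r, c) of a grid by nested dissection: number the two
    halves first (recursively), then the separator line between them."""
    if len(rows) == 0 or len(cols) == 0:
        return []
    if len(rows) <= 2 and len(cols) <= 2:
        return [(r, c) for r in rows for c in cols]  # tiny block: any order will do
    if len(cols) >= len(rows):
        mid = len(cols) // 2  # cut along a column
        separator = [(r, cols[mid]) for r in rows]
        first = nested_dissection(rows, cols[:mid])
        second = nested_dissection(rows, cols[mid + 1:])
    else:
        mid = len(rows) // 2  # cut along a row
        separator = [(rows[mid], c) for c in cols]
        first = nested_dissection(rows[:mid], cols)
        second = nested_dissection(rows[mid + 1:], cols)
    return first + second + separator

order = nested_dissection(range(7), range(7))
print(order[-7:])  # the top-level separator (middle column) is numbered last
```

With a 5-point stencil, no point in the first half is coupled to a point in the second half, so the corresponding blocks of the Cholesky factor can be computed concurrently; the same holds recursively inside each half.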

Several algorithms that are particularly appropriate for vector architectures have been
developed. In particular, two problems of a statistical nature have been studied: the
computation of variances for large data samples and the geodetic data fitting problem.

The problem of computing the variance of a sample of N data points may be difficult for
certain data sets, particularly when N is large and the variance is small. In [1], we studied
several algorithms and their round-off error bounds. We presented a new algorithm that is
highly efficient in a vector environment and has excellent numerical properties.
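The algorithm of [1] is not reproduced here. As an illustration of the difficulty, the sketch contrasts the textbook one-pass formula, which suffers catastrophic cancellation when the mean is large relative to the spread, with a well-known pairwise scheme that combines (count, mean, sum of squared deviations) for the two halves of the data. The function names are hypothetical, and the pairwise scheme is a standard technique offered only as an example of a numerically sound approach.

```python
def naive_variance(xs):
    """Textbook formula (sum x^2 - (sum x)^2 / n) / (n - 1): prone to cancellation."""
    n = len(xs)
    s = sum(xs)
    sq = sum(x * x for x in xs)
    return (sq - s * s / n) / (n - 1)

def _combine(a, b):
    """Merge (count, mean, M2) summaries of two disjoint samples."""
    na, mean_a, m2_a = a
    nb, mean_b, m2_b = b
    n = na + nb
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    m2 = m2_a + m2_b + delta * delta * na * nb / n
    return n, mean, m2

def _summary(xs):
    """Pairwise (divide-and-conquer) reduction to (count, mean, M2)."""
    if len(xs) == 1:
        return 1, float(xs[0]), 0.0
    mid = len(xs) // 2
    return _combine(_summary(xs[:mid]), _summary(xs[mid:]))

def pairwise_variance(xs):
    """Unbiased sample variance M2 / (n - 1) via pairwise combination."""
    xs = list(xs)
    if len(xs) < 2:
        raise ValueError("need at least two data points")
    n, _, m2 = _summary(xs)
    return m2 / (n - 1)

data = [1e9 + v for v in (4.0, 7.0, 13.0, 16.0)]  # large mean, small spread
print(naive_variance(data), pairwise_variance(data))
```

For these data the exact sample variance is 30; the naive formula works with squares near 10^18 whose rounding error far exceeds 30, while the pairwise scheme only ever subtracts means. The independent halves also make the reduction easy to spread across vector lanes or processors.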

In [3], we described and compared some numerical methods for solving large linear least
squares problems that arise in geodesy and, more specifically, in Doppler positioning. The
methods considered are direct orthogonal decomposition and the combination of conjugate
gradient type algorithms with projections, as well as the exploitation of “Property A”.
Numerical results are given, and the respective advantages of the methods are discussed
with respect to such parameters as CPU time, input/output, and storage requirements.
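Of the methods compared in [3], the conjugate gradient approach is the easiest to sketch. Below is CGLS, a standard form of conjugate gradients applied to the normal equations A^T A x = A^T b that uses only products with A and A^T and never forms A^T A. The projection and “Property A” refinements studied in [3] are not included, the name `cgls` is our own, and the random test matrix is an assumption, not geodetic data.

```python
import numpy as np

def cgls(A, b, tol=1e-10, maxiter=None):
    """Conjugate gradients on A^T A x = A^T b without forming A^T A.
    Stops when ||A^T (b - A x)|| <= tol or after maxiter steps."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[1]
    maxiter = 10 * n if maxiter is None else maxiter
    x = np.zeros(n)
    r = b.copy()   # least squares residual b - A x
    s = A.T @ r    # normal-equations residual A^T r
    p = s.copy()   # search direction
    gamma = s @ s
    for _ in range(maxiter):
        if np.sqrt(gamma) <= tol:
            break
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 8))
b = rng.standard_normal(200)
x = cgls(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]  # dense reference solution
print(np.linalg.norm(x - x_ref))
```

The appeal for large sparse problems is that storage is limited to A and a few vectors, whereas an orthogonal decomposition must store and update a factor; the price is that convergence depends on the conditioning of A^T A.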

Iterative methods are often used for solving the linear systems arising from the
approximation of elliptic partial differential equations. The Chebyshev and second-order
Richardson methods are classical iterative schemes for solving such systems. In [4], we
consider the convergence analysis of these methods when each step of the iteration is
carried out inexactly. This has many applications, since a preconditioned iteration
requires, at each step, the solution of a linear system, which may be solved inexactly using
an "inner" iteration. In domain decomposition (or substructuring), for example, it may be
desirable to solve the subsystems approximately. We have also derived an error bound that
applies to the general nonsymmetric inexact Chebyshev iteration.
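A minimal sketch of a preconditioned Chebyshev iteration whose preconditioner solve is inexact, in the spirit of the domain decomposition example above. The setup is assumed for illustration only: A = tridiag(-1, 4, -1), a two-block preconditioner M obtained by deleting the coupling between two subdomains, eigenvalue bounds of M^{-1}A computed numerically, and an inner solve of a fixed number of Jacobi sweeps (the hypothetical helper `jacobi_inner`). It is not the analysis of [4]; it only shows that, in this case, the outer iteration tolerates approximate inner solves.

```python
import numpy as np

def chebyshev(A, b, lmin, lmax, apply_prec, iters):
    """Preconditioned Chebyshev iteration for A x = b from x0 = 0.
    Assumes the eigenvalues of M^{-1} A are real and (approximately) in
    [lmin, lmax] with lmin > 0; apply_prec(r) approximates M^{-1} r."""
    theta = (lmax + lmin) / 2.0  # center of the eigenvalue interval
    delta = (lmax - lmin) / 2.0  # half-width of the interval
    sigma = theta / delta
    rho = 1.0 / sigma
    x = np.zeros_like(b)
    r = b.copy()  # residual b - A x
    d = apply_prec(r) / theta
    for _ in range(iters):
        x = x + d
        r = r - A @ d
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * apply_prec(r)
        rho = rho_new
    return x

def jacobi_inner(M, sweeps):
    """Hypothetical inexact solver: `sweeps` Jacobi steps on M z = r from z = 0."""
    diag = np.diag(M)
    off = np.diag(diag) - M  # splitting M = D - off
    def apply(r):
        z = np.zeros_like(r)
        for _ in range(sweeps):
            z = (r + off @ z) / diag
        return z
    return apply

n = 40
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = A.copy()
M[n // 2 - 1, n // 2] = M[n // 2, n // 2 - 1] = 0.0  # decouple two subdomains
ev = np.linalg.eigvals(np.linalg.solve(M, A)).real   # bounds for the exact M
lmin, lmax = ev.min(), ev.max()
b = np.ones(n)
x_true = np.linalg.solve(A, b)

x_exact = chebyshev(A, b, lmin, lmax, lambda r: np.linalg.solve(M, r), 30)
x_inexact = chebyshev(A, b, lmin, lmax, jacobi_inner(M, 5), 60)
for name, xk in [("exact inner solve", x_exact), ("5 Jacobi sweeps", x_inexact)]:
    print(name, np.linalg.norm(xk - x_true) / np.linalg.norm(x_true))
```

The bounds are those of the exact preconditioner, so for the inexact one they are only approximate; the outer iteration still converges here because the perturbed eigenvalues stay well inside (0, 2θ), at the cost of a somewhat slower rate.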